Basic Statistics

Raw Counts

Name                    Value
Rows                    609,459
Columns                 56
Discrete columns        6
Continuous columns      50
All missing columns     0
Missing observations    0
Complete Rows           609,459
Total observations      34,129,704
Memory allocation       373.9 Mb
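
The figures in the table above are related by simple arithmetic, which the short check below makes explicit. It is a sketch only: the variable names are illustrative, and it assumes that "Total observations" means rows times columns and that a row is "complete" when it has no missing values.

```python
# Figures copied from the Raw Counts table.
rows = 609_459
columns = 56
discrete_columns = 6
continuous_columns = 50
missing_observations = 0
complete_rows = 609_459

# Every column is counted as either discrete or continuous.
assert discrete_columns + continuous_columns == columns

# Assumption: total observations = number of cells = rows * columns.
total_observations = rows * columns
print(total_observations)

# With no missing observations, every row should be complete.
share_complete = complete_rows / rows
print(share_complete)
```

Under these assumptions the product matches the reported total of 34,129,704, and the share of complete rows is 1.0.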

Percentages

Data Structure

Missing Data Profile

Univariate Distribution

Histogram

Bar Chart (by frequency)

## 5 columns ignored with more than 50 categories.
## textID: 12242 categories
## text: 12224 categories
## sel_text: 7657 categories
## ngram_text: 566065 categories
## dif_text: 472934 categories
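
The message above says that discrete columns are skipped when they exceed a category limit, but the report does not show how the limit is applied. A minimal sketch, assuming the rule is "ignore a column when its number of distinct values is greater than the limit", is given below. The function name `split_by_cardinality` and the parameter `max_categories` are hypothetical and do not come from the reporting tool; the category counts are the ones printed in the message.

```python
def split_by_cardinality(category_counts, max_categories):
    """Split column names into (kept, ignored) by number of categories.

    Assumption: a column is ignored when its category count is
    strictly greater than max_categories.
    """
    kept = []
    ignored = []
    for name, count in category_counts.items():
        if count > max_categories:
            ignored.append(name)
        else:
            kept.append(name)
    return kept, ignored


# Category counts as reported in the message above.
category_counts = {
    "textID": 12242,
    "text": 12224,
    "sel_text": 7657,
    "ngram_text": 566065,
    "dif_text": 472934,
}

kept, ignored = split_by_cardinality(category_counts, max_categories=50)
for name in ignored:
    print(f"{name}: {category_counts[name]} categories")
```

Because the smallest of these counts is 7657, the same five columns are ignored under a limit of 50 or under the limit of 20 reported in the Correlation Analysis section.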

Bar Chart (by jaccard)

## 5 columns ignored with more than 50 categories.
## textID: 12242 categories
## text: 12224 categories
## sel_text: 7657 categories
## ngram_text: 566065 categories
## dif_text: 472934 categories

QQ Plot

QQ Plot (by jaccard)

Correlation Analysis

## 5 features with more than 20 categories ignored!
## textID: 12242 categories
## text: 12224 categories
## sel_text: 7657 categories
## ngram_text: 566065 categories
## dif_text: 472934 categories

Principal Component Analysis

## 5 features with more than 50 categories ignored!
## textID: 12242 categories
## text: 12224 categories
## sel_text: 7657 categories
## ngram_text: 566065 categories
## dif_text: 472934 categories

Bivariate Distribution

Boxplot (by jaccard)

Scatterplot (by jaccard)